Overview

Dataset statistics

Number of variables: 17
Number of observations: 173216
Missing cells: 173223
Missing cells (%): 5.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 22.5 MiB
Average record size in memory: 136.0 B
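
The percentages and sizes above can be cross-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB is 2^20 bytes; the variable names are illustrative and do not come from the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 17
n_observations = 173216
missing_cells = 173223
avg_record_bytes = 136.0

# Share of missing cells over the whole table (assumed definition).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Total size implied by the average record size (assumes 1 MiB = 2**20 bytes).
total_mib = avg_record_bytes * n_observations / 2**20

print(f"{missing_pct:.1f}%")    # missing cells (%)
print(f"{total_mib:.1f} MiB")   # total size in memory
```

The missing-cell count also matches the per-variable figures further down: 173216 missing values in rate_code plus 7 in pickup_year.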

Variable types

Numeric: 9
Categorical: 7
Unsupported: 1

Warnings

rate_code_group has constant value "Others" Constant
pickup_datetime has a high cardinality: 38803 distinct values High cardinality
dropoff_datetime has a high cardinality: 104237 distinct values High cardinality
pickup_longitude is highly correlated with pickup_latitude High correlation
pickup_latitude is highly correlated with pickup_longitude High correlation
trip_distance is highly correlated with fare_amount High correlation
fare_amount is highly correlated with trip_distance High correlation
tip_amount is highly correlated with tip_paid High correlation
tip_paid is highly correlated with tip_amount High correlation
pickup_longitude is highly correlated with pickup_latitude High correlation
pickup_latitude is highly correlated with pickup_longitude High correlation
dropoff_longitude is highly correlated with dropoff_latitude High correlation
dropoff_latitude is highly correlated with dropoff_longitude High correlation
trip_distance is highly correlated with fare_amount High correlation
fare_amount is highly correlated with trip_distance High correlation
tip_amount is highly correlated with tip_paid High correlation
tip_paid is highly correlated with tip_amount High correlation
trip_distance is highly correlated with fare_amount High correlation
fare_amount is highly correlated with trip_distance High correlation
tip_amount is highly correlated with tip_paid High correlation
tip_paid is highly correlated with tip_amount High correlation
pickup_latitude is highly correlated with pickup_longitude High correlation
fare_amount is highly correlated with trip_distance High correlation
payment_type is highly correlated with tip_paid High correlation
dropoff_longitude is highly correlated with dropoff_latitude High correlation
tip_paid is highly correlated with payment_type High correlation
dropoff_latitude is highly correlated with dropoff_longitude High correlation
pickup_longitude is highly correlated with pickup_latitude High correlation
trip_distance is highly correlated with fare_amount High correlation
payment_type is highly correlated with tip_paid and 1 other field High correlation
tip_paid is highly correlated with payment_type and 1 other field High correlation
rate_code_group is highly correlated with payment_type and 3 other fields High correlation
pickup_year is highly correlated with rate_code_group High correlation
vendor_id is highly correlated with rate_code_group High correlation
rate_code has 173216 (100.0%) missing values Missing
df_index has unique values Unique
rate_code is an unsupported type, check if it needs cleaning or further analysis Unsupported
tip_amount has 132721 (76.6%) zeros Zeros
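
Flags such as Constant, Unique, Missing and Zeros can be derived from simple per-column checks. The sketch below is illustrative only: `column_flags` is a hypothetical helper, its rules are assumptions, and it is not the logic pandas-profiling uses (which may apply thresholds not shown in this report). The toy columns are made up.

```python
def column_flags(values):
    """Return simple data-quality flags for one column (illustrative rules only)."""
    present = [v for v in values if v is not None]
    flags = []
    if len(present) < len(values):
        flags.append("Missing")    # at least one missing value
    if present and len(set(present)) == 1:
        flags.append("Constant")   # a single distinct non-missing value
    if len(present) > 1 and len(set(present)) == len(present):
        flags.append("Unique")     # every non-missing value is distinct
    if any(v == 0 for v in present if isinstance(v, (int, float))):
        flags.append("Zeros")      # at least one numeric zero
    return flags


# Toy columns, not taken from the dataset.
print(column_flags(["Others", "Others", "Others"]))
print(column_flags([25, 33, 34]))
print(column_flags([None, None]))
print(column_flags([0, 0, 1.5, 0]))
```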

Reproduction

Analysis started: 2021-07-02 10:14:03.150965
Analysis finished: 2021-07-02 10:14:47.528069
Duration: 44.38 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 173216
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 536501.5263
Minimum: 25
Maximum: 1071895
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 25
5-th percentile: 53300.5
Q1: 267699
median: 535609.5
Q3: 806440.5
95-th percentile: 1018794.5
Maximum: 1071895
Range: 1071870
Interquartile range (IQR): 538741.5

Descriptive statistics

Standard deviation: 310189.4109
Coefficient of variation (CV): 0.5781706028
Kurtosis: -1.206061358
Mean: 536501.5263
Median Absolute Deviation (MAD): 269417
Skewness: -0.0004635810034
Sum: 9.293064838 × 10^10
Variance: 9.621747063 × 10^10
Monotonicity: Strictly increasing
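
Several of the df_index figures are derived from others in the same block. The sketch below recomputes them from the reported values, assuming the usual definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative.

```python
import math

# Values copied from the df_index statistics above.
minimum, maximum = 25, 1071895
q1, q3 = 267699, 806440.5
mean, std = 536501.5263, 310189.4109

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(value_range, iqr)
# Compare against the reported CV and variance within rounding tolerance.
print(math.isclose(cv, 0.5781706028, rel_tol=1e-6))
print(math.isclose(variance, 9.621747063e10, rel_tol=1e-6))
```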
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
526335                 1       < 0.1%
826681                 1       < 0.1%
740675                 1       < 0.1%
738626                 1       < 0.1%
1004864                1       < 0.1%
306495                 1       < 0.1%
796747                 1       < 0.1%
48445                  1       < 0.1%
570684                 1       < 0.1%
296250                 1       < 0.1%
Other values (173206)  173206  > 99.9%

Value  Count  Frequency (%)
25     1      < 0.1%
33     1      < 0.1%
34     1      < 0.1%
38     1      < 0.1%
46     1      < 0.1%
47     1      < 0.1%
49     1      < 0.1%
57     1      < 0.1%
60     1      < 0.1%
61     1      < 0.1%

Value    Count  Frequency (%)
1071895  1      < 0.1%
1071881  1      < 0.1%
1071875  1      < 0.1%
1071856  1      < 0.1%
1071845  1      < 0.1%
1071836  1      < 0.1%
1071821  1      < 0.1%
1071803  1      < 0.1%
1071798  1      < 0.1%
1071790  1      < 0.1%

vendor_id
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
VTS  86515
CMT  77254
DDS  9447

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 519648
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: VTS
2nd row: CMT
3rd row: VTS
4th row: VTS
5th row: VTS

Common Values

Value  Count  Frequency (%)
VTS    86515  49.9%
CMT    77254  44.6%
DDS    9447   5.5%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
vts    86515  49.9%
cmt    77254  44.6%
dds    9447   5.5%

Most occurring characters

Value  Count   Frequency (%)
T      163769  31.5%
S      95962   18.5%
V      86515   16.6%
C      77254   14.9%
M      77254   14.9%
D      18894   3.6%
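
The character counts follow directly from the category counts: every occurrence of a vendor code contributes one count for each of its characters. A minimal sketch of that derivation, using the counts reported under Common Values (the variable names are illustrative):

```python
from collections import Counter

# Category counts copied from the vendor_id common values above.
category_counts = {"VTS": 86515, "CMT": 77254, "DDS": 9447}

# Each occurrence of a category adds one count per character it contains.
char_counts = Counter()
for category, count in category_counts.items():
    for ch in category:
        char_counts[ch] += count

total = sum(char_counts.values())
for ch, count in char_counts.most_common():
    print(ch, count, f"{100 * count / total:.1f}%")
```

For example, T appears once in VTS and once in CMT, giving 86515 + 77254 = 163769, and D appears twice in DDS, giving 2 × 9447 = 18894.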

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  519648  100.0%

Most frequent character per category

Uppercase Letter
Value  Count   Frequency (%)
T      163769  31.5%
S      95962   18.5%
V      86515   16.6%
C      77254   14.9%
M      77254   14.9%
D      18894   3.6%

Most occurring scripts

Value  Count   Frequency (%)
Latin  519648  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
T      163769  31.5%
S      95962   18.5%
V      86515   16.6%
C      77254   14.9%
M      77254   14.9%
D      18894   3.6%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  519648  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
T      163769  31.5%
S      95962   18.5%
V      86515   16.6%
C      77254   14.9%
M      77254   14.9%
D      18894   3.6%

pickup_datetime
Categorical

HIGH CARDINALITY

Distinct: 38803
Distinct (%): 22.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
2009-04-23 19:28:00+00:00  214
2009-10-21 19:31:00+00:00  205
2009-03-10 18:28:00+00:00  202
2009-04-07 19:04:00+00:00  202
2009-01-24 20:08:00+00:00  200
Other values (38798)       172193

Length

Max length: 25
Median length: 25
Mean length: 25
Min length: 25

Characters and Unicode

Total characters: 4330400
Distinct characters: 14
Distinct categories: 5
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13832
Unique (%): 8.0%

Sample

1st row: 2009-11-23 15:45:00+00:00
2nd row: 2009-10-10 01:29:17+00:00
3rd row: 2009-10-01 19:31:00+00:00
4th row: 2009-03-20 01:53:00+00:00
5th row: 2009-12-14 18:34:00+00:00

Common Values

Value                      Count   Frequency (%)
2009-04-23 19:28:00+00:00  214     0.1%
2009-10-21 19:31:00+00:00  205     0.1%
2009-03-10 18:28:00+00:00  202     0.1%
2009-04-07 19:04:00+00:00  202     0.1%
2009-01-24 20:08:00+00:00  200     0.1%
2009-01-31 18:40:00+00:00  199     0.1%
2009-02-26 18:47:00+00:00  196     0.1%
2009-06-23 18:52:00+00:00  196     0.1%
2009-01-26 19:26:00+00:00  195     0.1%
2009-10-01 19:31:00+00:00  194     0.1%
Other values (38793)       171213  98.8%

Length

Histogram of lengths of the category
Value                 Count   Frequency (%)
2009-04-23            1136    0.3%
2009-08-07            1077    0.3%
2009-02-26            1057    0.3%
2009-10-27            1037    0.3%
2009-08-18            1031    0.3%
2009-05-22            1016    0.3%
2009-05-16            972     0.3%
2009-07-15            953     0.3%
2009-03-14            939     0.3%
2009-09-17            902     0.3%
Other values (31144)  336312  97.1%

Most occurring characters

Value             Count    Frequency (%)
0                 1562784  36.1%
:                 519648   12.0%
2                 398327   9.2%
-                 346432   8.0%
1                 336629   7.8%
9                 250467   5.8%
(space)           173216   4.0%
+                 173216   4.0%
3                 124425   2.9%
5                 113171   2.6%
Other values (4)  332085   7.7%

Most occurring categories

Value              Count    Frequency (%)
Decimal Number     3117888  72.0%
Other Punctuation  519648   12.0%
Dash Punctuation   346432   8.0%
Space Separator    173216   4.0%
Math Symbol        173216   4.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      1562784  50.1%
2      398327   12.8%
1      336629   10.8%
9      250467   8.0%
3      124425   4.0%
5      113171   3.6%
4      110821   3.6%
8      79325    2.5%
7      75045    2.4%
6      66894    2.1%
Dash Punctuation
Value  Count   Frequency (%)
-      346432  100.0%
Space Separator
Value    Count   Frequency (%)
(space)  173216  100.0%
Other Punctuation
Value  Count   Frequency (%)
:      519648  100.0%
Math Symbol
Value  Count   Frequency (%)
+      173216  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  4330400  100.0%

Most frequent character per script

Common
Value             Count    Frequency (%)
0                 1562784  36.1%
:                 519648   12.0%
2                 398327   9.2%
-                 346432   8.0%
1                 336629   7.8%
9                 250467   5.8%
(space)           173216   4.0%
+                 173216   4.0%
3                 124425   2.9%
5                 113171   2.6%
Other values (4)  332085   7.7%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  4330400  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
0                 1562784  36.1%
:                 519648   12.0%
2                 398327   9.2%
-                 346432   8.0%
1                 336629   7.8%
9                 250467   5.8%
(space)           173216   4.0%
+                 173216   4.0%
3                 124425   2.9%
5                 113171   2.6%
Other values (4)  332085   7.7%

dropoff_datetime
Categorical

HIGH CARDINALITY

Distinct: 104237
Distinct (%): 60.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
2009-12-04 21:19:00+00:00  31
2009-12-04 21:21:00+00:00  27
2009-02-21 22:32:00+00:00  26
2009-03-10 18:34:00+00:00  25
2009-10-27 22:55:00+00:00  25
Other values (104232)      173082

Length

Max length: 25
Median length: 25
Mean length: 25
Min length: 25

Characters and Unicode

Total characters: 4330400
Distinct characters: 14
Distinct categories: 5
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 90736
Unique (%): 52.4%

Sample

1st row: 2009-11-23 15:54:00+00:00
2nd row: 2009-10-10 01:45:40+00:00
3rd row: 2009-10-01 19:45:00+00:00
4th row: 2009-03-20 01:59:00+00:00
5th row: 2009-12-14 18:41:00+00:00
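
Both pickup_datetime and dropoff_datetime are profiled as Categorical, which suggests they are stored as strings. Parsing them as date-times would allow derived quantities such as trip duration. The sketch below uses only the five sample rows listed for each variable and assumes that the i-th sample row of both variables belongs to the same trip; the list names are illustrative.

```python
from datetime import datetime

# First five sample rows of pickup_datetime and dropoff_datetime, as listed in the report.
pickups = [
    "2009-11-23 15:45:00+00:00",
    "2009-10-10 01:29:17+00:00",
    "2009-10-01 19:31:00+00:00",
    "2009-03-20 01:53:00+00:00",
    "2009-12-14 18:34:00+00:00",
]
dropoffs = [
    "2009-11-23 15:54:00+00:00",
    "2009-10-10 01:45:40+00:00",
    "2009-10-01 19:45:00+00:00",
    "2009-03-20 01:59:00+00:00",
    "2009-12-14 18:41:00+00:00",
]

# Parse the strings into timezone-aware datetimes and take the difference in seconds.
durations = [
    (datetime.fromisoformat(d) - datetime.fromisoformat(p)).total_seconds()
    for p, d in zip(pickups, dropoffs)
]
print(durations)
```

Under that pairing assumption the five sample trips last between 6 and about 16 minutes.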

Common Values

Value                      Count   Frequency (%)
2009-12-04 21:19:00+00:00  31      < 0.1%
2009-12-04 21:21:00+00:00  27      < 0.1%
2009-02-21 22:32:00+00:00  26      < 0.1%
2009-03-10 18:34:00+00:00  25      < 0.1%
2009-10-27 22:55:00+00:00  25      < 0.1%
2009-11-10 08:47:00+00:00  25      < 0.1%
2009-12-04 21:18:00+00:00  24      < 0.1%
2009-01-24 20:13:00+00:00  24      < 0.1%
2009-10-27 22:56:00+00:00  24      < 0.1%
2009-10-24 21:42:00+00:00  23      < 0.1%
Other values (104227)      172962  99.9%

Length

Histogram of lengths of the category
Value                 Count   Frequency (%)
2009-04-23            1140    0.3%
2009-08-07            1073    0.3%
2009-02-26            1057    0.3%
2009-10-27            1036    0.3%
2009-08-18            1029    0.3%
2009-05-22            1015    0.3%
2009-05-16            972     0.3%
2009-07-15            954     0.3%
2009-03-14            946     0.3%
2009-09-17            900     0.3%
Other values (52690)  336310  97.1%

Most occurring characters

Value             Count    Frequency (%)
0                 1563182  36.1%
:                 519648   12.0%
2                 397029   9.2%
-                 346432   8.0%
1                 337701   7.8%
9                 252740   5.8%
(space)           173216   4.0%
+                 173216   4.0%
3                 126106   2.9%
5                 112560   2.6%
Other values (4)  328570   7.6%

Most occurring categories

Value              Count    Frequency (%)
Decimal Number     3117888  72.0%
Other Punctuation  519648   12.0%
Dash Punctuation   346432   8.0%
Space Separator    173216   4.0%
Math Symbol        173216   4.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      1563182  50.1%
2      397029   12.7%
1      337701   10.8%
9      252740   8.1%
3      126106   4.0%
5      112560   3.6%
4      110777   3.6%
8      76548    2.5%
7      73233    2.3%
6      68012    2.2%
Dash Punctuation
Value  Count   Frequency (%)
-      346432  100.0%
Space Separator
Value    Count   Frequency (%)
(space)  173216  100.0%
Other Punctuation
Value  Count   Frequency (%)
:      519648  100.0%
Math Symbol
Value  Count   Frequency (%)
+      173216  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  4330400  100.0%

Most frequent character per script

Common
Value             Count    Frequency (%)
0                 1563182  36.1%
:                 519648   12.0%
2                 397029   9.2%
-                 346432   8.0%
1                 337701   7.8%
9                 252740   5.8%
(space)           173216   4.0%
+                 173216   4.0%
3                 126106   2.9%
5                 112560   2.6%
Other values (4)  328570   7.6%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  4330400  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
0                 1563182  36.1%
:                 519648   12.0%
2                 397029   9.2%
-                 346432   8.0%
1                 337701   7.8%
9                 252740   5.8%
(space)           173216   4.0%
+                 173216   4.0%
3                 126106   2.9%
5                 112560   2.6%
Other values (4)  328570   7.6%

pickup_longitude
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 48301
Distinct (%): 27.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.97689252
Minimum: -73.999999
Maximum: -73.900044
Zeros: 0
Zeros (%): 0.0%
Negative: 173216
Negative (%): 100.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: -73.999999
5-th percentile: -73.99608725
Q1: -73.987858
median: -73.97946
Q3: -73.968017
95-th percentile: -73.95190375
Maximum: -73.900044
Range: 0.099955
Interquartile range (IQR): 0.019841

Descriptive statistics

Standard deviation: 0.01427508304
Coefficient of variation (CV): -0.000192966784
Kurtosis: 0.8538684583
Mean: -73.97689252
Median Absolute Deviation (MAD): 0.009428
Skewness: 0.830872692
Sum: -12813981.41
Variance: 0.0002037779957
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
-73.98225             36      < 0.1%
-73.982179            30      < 0.1%
-73.981918            30      < 0.1%
-73.98206             30      < 0.1%
-73.982037            28      < 0.1%
-73.981515            26      < 0.1%
-73.980497            25      < 0.1%
-73.981847            25      < 0.1%
-73.982108            25      < 0.1%
-73.981989            24      < 0.1%
Other values (48291)  172937  99.8%

Value       Count  Frequency (%)
-73.999999  1      < 0.1%
-73.999998  5      < 0.1%
-73.999997  4      < 0.1%
-73.999996  1      < 0.1%
-73.999995  4      < 0.1%
-73.999993  5      < 0.1%
-73.999992  2      < 0.1%
-73.999991  1      < 0.1%
-73.99999   2      < 0.1%
-73.999989  2      < 0.1%

Value       Count  Frequency (%)
-73.900044  1      < 0.1%
-73.900147  1      < 0.1%
-73.900205  1      < 0.1%
-73.900227  1      < 0.1%
-73.900367  1      < 0.1%
-73.900405  1      < 0.1%
-73.900453  1      < 0.1%
-73.900532  1      < 0.1%
-73.900887  1      < 0.1%
-73.900895  1      < 0.1%

pickup_latitude
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 61265
Distinct (%): 35.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.75720439
Minimum: 40.700005
Maximum: 40.799997
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 40.700005
5-th percentile: 40.724768
Q1: 40.745138
median: 40.758152
Q3: 40.76991825
95-th percentile: 40.78568225
Maximum: 40.799997
Range: 0.099992
Interquartile range (IQR): 0.02478025

Descriptive statistics

Standard deviation: 0.01803157869
Coefficient of variation (CV): 0.0004424145119
Kurtosis: -0.4393484571
Mean: 40.75720439
Median Absolute Deviation (MAD): 0.012379
Skewness: -0.1779794106
Sum: 7059799.916
Variance: 0.0003251378299
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
40.76281              22      < 0.1%
40.762935             22      < 0.1%
40.764173             22      < 0.1%
40.764065             22      < 0.1%
40.759437             21      < 0.1%
40.75124              20      < 0.1%
40.763438             20      < 0.1%
40.749858             19      < 0.1%
40.750307             19      < 0.1%
40.749805             19      < 0.1%
Other values (61255)  173010  99.9%

Value      Count  Frequency (%)
40.700005  1      < 0.1%
40.700043  1      < 0.1%
40.700076  1      < 0.1%
40.70014   1      < 0.1%
40.700295  1      < 0.1%
40.700321  1      < 0.1%
40.700345  1      < 0.1%
40.700382  1      < 0.1%
40.700428  1      < 0.1%
40.700437  1      < 0.1%

Value      Count  Frequency (%)
40.799997  1      < 0.1%
40.799983  1      < 0.1%
40.799981  1      < 0.1%
40.799978  1      < 0.1%
40.799977  1      < 0.1%
40.799975  2      < 0.1%
40.799973  1      < 0.1%
40.799968  1      < 0.1%
40.799963  1      < 0.1%
40.799962  2      < 0.1%

dropoff_longitude
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 50541
Distinct (%): 29.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.97561718
Minimum: -73.999999
Maximum: -73.900017
Zeros: 0
Zeros (%): 0.0%
Negative: 173216
Negative (%): 100.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: -73.999999
5-th percentile: -73.99572525
Q1: -73.987
median: -73.97839
Q3: -73.966923
95-th percentile: -73.949565
Maximum: -73.900017
Range: 0.099982
Interquartile range (IQR): 0.020077

Descriptive statistics

Standard deviation: 0.01535660968
Coefficient of variation (CV): -0.0002075901529
Kurtosis: 1.753215749
Mean: -73.97561718
Median Absolute Deviation (MAD): 0.009578
Skewness: 1.059273186
Sum: -12813760.5
Variance: 0.0002358254609
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
-73.982368            28      < 0.1%
-73.991228            25      < 0.1%
-73.991015            24      < 0.1%
-73.990825            23      < 0.1%
-73.991678            23      < 0.1%
-73.983292            23      < 0.1%
-73.979715            23      < 0.1%
-73.988148            22      < 0.1%
-73.98206             22      < 0.1%
-73.981018            22      < 0.1%
Other values (50531)  172981  99.9%

Value       Count  Frequency (%)
-73.999999  2      < 0.1%
-73.999998  5      < 0.1%
-73.999997  2      < 0.1%
-73.999995  5      < 0.1%
-73.999994  1      < 0.1%
-73.999993  6      < 0.1%
-73.999992  2      < 0.1%
-73.99999   1      < 0.1%
-73.999989  2      < 0.1%
-73.999988  4      < 0.1%

Value       Count  Frequency (%)
-73.900017  1      < 0.1%
-73.900037  1      < 0.1%
-73.900072  1      < 0.1%
-73.900074  1      < 0.1%
-73.900089  1      < 0.1%
-73.900092  1      < 0.1%
-73.900135  1      < 0.1%
-73.900163  1      < 0.1%
-73.900166  1      < 0.1%
-73.900178  1      < 0.1%

dropoff_latitude
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 62864
Distinct (%): 36.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.75764219
Minimum: 40.700009
Maximum: 40.799998
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 40.700009
5-th percentile: 40.72419875
Q1: 40.745988
median: 40.7585425
Q3: 40.77070825
95-th percentile: 40.7871655
Maximum: 40.799998
Range: 0.099989
Interquartile range (IQR): 0.02472025

Descriptive statistics

Standard deviation: 0.01858345416
Coefficient of variation (CV): 0.0004559501767
Kurtosis: -0.2885797394
Mean: 40.75764219
Median Absolute Deviation (MAD): 0.0123425
Skewness: -0.232906668
Sum: 7059875.749
Variance: 0.0003453447684
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
40.750128             23      < 0.1%
40.755563             20      < 0.1%
40.764047             20      < 0.1%
40.762523             19      < 0.1%
40.750253             19      < 0.1%
40.756173             19      < 0.1%
40.762648             18      < 0.1%
40.763258             18      < 0.1%
40.76394              18      < 0.1%
40.760747             18      < 0.1%
Other values (62854)  173024  99.9%

Value      Count  Frequency (%)
40.700009  1      < 0.1%
40.700028  1      < 0.1%
40.700038  1      < 0.1%
40.700041  2      < 0.1%
40.700047  1      < 0.1%
40.700055  1      < 0.1%
40.700069  1      < 0.1%
40.700097  1      < 0.1%
40.700108  1      < 0.1%
40.700135  1      < 0.1%

Value      Count  Frequency (%)
40.799998  2      < 0.1%
40.799995  1      < 0.1%
40.799992  1      < 0.1%
40.79999   1      < 0.1%
40.799985  2      < 0.1%
40.799981  1      < 0.1%
40.79998   2      < 0.1%
40.799972  2      < 0.1%
40.79997   1      < 0.1%
40.799968  1      < 0.1%

rate_code
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 173216
Missing (%): 100.0%
Memory size: 1.3 MiB

passenger_count
Real number (ℝ≥0)

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.691217901
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 5
Maximum: 6
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.259826671
Coefficient of variation (CV): 0.7449227387
Kurtosis: 2.203046654
Mean: 1.691217901
Median Absolute Deviation (MAD): 0
Skewness: 1.861627547
Sum: 292946
Variance: 1.58716324
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value  Count   Frequency (%)
1      117931  68.1%
2      27615   15.9%
5      15367   8.9%
3      7854    4.5%
4      3653    2.1%
6      796     0.5%

Value  Count   Frequency (%)
1      117931  68.1%
2      27615   15.9%
3      7854    4.5%
4      3653    2.1%
5      15367   8.9%
6      796     0.5%

Value  Count   Frequency (%)
6      796     0.5%
5      15367   8.9%
4      3653    2.1%
3      7854    4.5%
2      27615   15.9%
1      117931  68.1%
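
Because passenger_count takes only six values, its sum and mean can be recomputed from the frequency table alone. A minimal sketch using the counts above (the variable names are illustrative):

```python
# Frequency table copied from the passenger_count values above: value -> count.
counts = {1: 117931, 2: 27615, 3: 7854, 4: 3653, 5: 15367, 6: 796}

n = sum(counts.values())                                      # number of observations
total = sum(value * count for value, count in counts.items())  # Sum
mean = total / n                                              # Mean

print(n, total, round(mean, 6))
```

The results agree with the reported number of observations (173216), Sum (292946) and Mean (1.691217901).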

trip_distance
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 936
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.772427374
Minimum: 0.01
Maximum: 14.2
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0.01
5-th percentile: 0.5
Q1: 0.91
median: 1.46
Q3: 2.3
95-th percentile: 4.1
Maximum: 14.2
Range: 14.19
Interquartile range (IQR): 1.39

Descriptive statistics

Standard deviation: 1.192034872
Coefficient of variation (CV): 0.6725437045
Kurtosis: 4.504261855
Mean: 1.772427374
Median Absolute Deviation (MAD): 0.62
Skewness: 1.732764087
Sum: 307012.78
Variance: 1.420947136
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
1                   5016    2.9%
0.9                 4991    2.9%
0.8                 4985    2.9%
1.1                 4722    2.7%
0.7                 4568    2.6%
1.2                 4558    2.6%
1.3                 4360    2.5%
0.6                 4057    2.3%
1.4                 4043    2.3%
1.5                 3740    2.2%
Other values (926)  128176  74.0%

Value  Count  Frequency (%)
0.01   23     < 0.1%
0.02   26     < 0.1%
0.03   32     < 0.1%
0.04   24     < 0.1%
0.05   19     < 0.1%
0.06   14     < 0.1%
0.07   20     < 0.1%
0.08   10     < 0.1%
0.09   17     < 0.1%
0.1    97     0.1%

Value  Count  Frequency (%)
14.2   1      < 0.1%
13.3   1      < 0.1%
12.94  1      < 0.1%
12.5   1      < 0.1%
12.2   1      < 0.1%
12.14  1      < 0.1%
11.93  1      < 0.1%
11.8   1      < 0.1%
11.7   2      < 0.1%
11.69  1      < 0.1%

payment_type
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
CSH  131390
CRD  41430
NOC  343
DIS  53

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 519648
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CSH
2nd row: CSH
3rd row: CSH
4th row: CSH
5th row: CSH

Common Values

Value  Count   Frequency (%)
CSH    131390  75.9%
CRD    41430   23.9%
NOC    343     0.2%
DIS    53      < 0.1%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
csh    131390  75.9%
crd    41430   23.9%
noc    343     0.2%
dis    53      < 0.1%

Most occurring characters

Value  Count   Frequency (%)
C      173163  33.3%
S      131443  25.3%
H      131390  25.3%
D      41483   8.0%
R      41430   8.0%
N      343     0.1%
O      343     0.1%
I      53      < 0.1%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  519648  100.0%

Most frequent character per category

Uppercase Letter
Value  Count   Frequency (%)
C      173163  33.3%
S      131443  25.3%
H      131390  25.3%
D      41483   8.0%
R      41430   8.0%
N      343     0.1%
O      343     0.1%
I      53      < 0.1%

Most occurring scripts

Value  Count   Frequency (%)
Latin  519648  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
C      173163  33.3%
S      131443  25.3%
H      131390  25.3%
D      41483   8.0%
R      41430   8.0%
N      343     0.1%
O      343     0.1%
I      53      < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  519648  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
C      173163  33.3%
S      131443  25.3%
H      131390  25.3%
D      41483   8.0%
R      41430   8.0%
N      343     0.1%
O      343     0.1%
I      53      < 0.1%

fare_amount
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 309
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.702362368
Minimum: 2.5
Maximum: 110
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 2.5
5-th percentile: 3.7
Q1: 5.3
median: 6.9
Q3: 9.3
95-th percentile: 14.1
Maximum: 110
Range: 107.5
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.400448014
Coefficient of variation (CV): 0.4414811782
Kurtosis: 18.65609008
Mean: 7.702362368
Median Absolute Deviation (MAD): 2
Skewness: 2.324401383
Sum: 1334172.4
Variance: 11.56304669
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
5.3                 10648  6.1%
5.7                 10341  6.0%
4.9                 10213  5.9%
6.1                 10199  5.9%
4.5                 9575   5.5%
6.5                 9479   5.5%
6.9                 8879   5.1%
7.3                 8268   4.8%
7.7                 7736   4.5%
4.1                 7720   4.5%
Other values (299)  80158  46.3%
ValueCountFrequency (%)
2.5290
 
0.2%
2.511
 
< 0.1%
2.9620
 
0.4%
310
 
< 0.1%
3.32491
1.4%
3.418
 
< 0.1%
3.55
 
< 0.1%
3.75235
3.0%
3.8105
 
0.1%
3.912
 
< 0.1%
ValueCountFrequency (%)
1101
 
< 0.1%
841
 
< 0.1%
53.31
 
< 0.1%
52.51
 
< 0.1%
50.52
 
< 0.1%
506
 
< 0.1%
49.5717
 
< 0.1%
49.1514
 
< 0.1%
4560
< 0.1%
40.11
 
< 0.1%

tip_amount
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 487
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.394561761
Minimum: 0
Maximum: 51
Zeros: 132721
Zeros (%): 76.6%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 51
Range: 51
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.840789144
Coefficient of variation (CV): 2.130944321
Kurtosis: 98.88960495
Mean: 0.394561761
Median Absolute Deviation (MAD): 0
Skewness: 4.279487054
Sum: 68344.41
Variance: 0.7069263847
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
0                   132721  76.6%
1                   9531    5.5%
2                   9529    5.5%
1.5                 2384    1.4%
3                   2160    1.2%
1.3                 544     0.3%
0.5                 506     0.3%
1.1                 408     0.2%
4                   394     0.2%
1.4                 383     0.2%
Other values (477)  14656   8.5%

Value  Count   Frequency (%)
0      132721  76.6%
0.01   21      < 0.1%
0.02   13      < 0.1%
0.03   6       < 0.1%
0.04   2       < 0.1%
0.05   2       < 0.1%
0.06   1       < 0.1%
0.07   1       < 0.1%
0.08   2       < 0.1%
0.09   2       < 0.1%

Value  Count  Frequency (%)
51     1      < 0.1%
20.4   1      < 0.1%
20     6      < 0.1%
16.5   1      < 0.1%
12.55  1      < 0.1%
12     4      < 0.1%
11.6   2      < 0.1%
11.5   2      < 0.1%
11.4   1      < 0.1%
11.25  2      < 0.1%

tip_paid
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
0  132721
1  40495

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 173216
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      132721  76.6%
1      40495   23.4%
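
The count of tip_paid = 0 (132721) equals the number of zeros reported for tip_amount, which is consistent with tip_paid being an indicator of a positive tip. The report does not state how tip_paid was built, so the rule in the sketch below is an assumption and `tip_paid` is a hypothetical helper.

```python
# Counts copied from the report.
n_observations = 173216
tip_amount_zeros = 132721
tip_paid_counts = {0: 132721, 1: 40495}


def tip_paid(tip_amount):
    """Hypothetical rule: 1 when the tip amount is positive, otherwise 0."""
    return 1 if tip_amount > 0 else 0


# Toy tip amounts, not taken from the dataset.
flags = [tip_paid(t) for t in (0, 1.5, 0, 2)]
print(flags)

# Consistency of the reported counts with the assumed rule.
tipped_share = 100 * tip_paid_counts[1] / n_observations
print(tip_paid_counts[0] == tip_amount_zeros, f"{tipped_share:.1f}%")
```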

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
0      132721  76.6%
1      40495   23.4%

Most occurring characters

Value  Count   Frequency (%)
0      132721  76.6%
1      40495   23.4%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  173216  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      132721  76.6%
1      40495   23.4%

Most occurring scripts

Value   Count   Frequency (%)
Common  173216  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      132721  76.6%
1      40495   23.4%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  173216  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      132721  76.6%
1      40495   23.4%

rate_code_group
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Others  173216

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 1039296
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Others
2nd row: Others
3rd row: Others
4th row: Others
5th row: Others

Common Values

Value   Count   Frequency (%)
Others  173216  100.0%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
others  173216  100.0%

Most occurring characters

Value  Count   Frequency (%)
O      173216  16.7%
t      173216  16.7%
h      173216  16.7%
e      173216  16.7%
r      173216  16.7%
s      173216  16.7%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  866080  83.3%
Uppercase Letter  173216  16.7%

Most frequent character per category

Lowercase Letter
Value  Count   Frequency (%)
t      173216  20.0%
h      173216  20.0%
e      173216  20.0%
r      173216  20.0%
s      173216  20.0%
Uppercase Letter
Value  Count   Frequency (%)
O      173216  100.0%

Most occurring scripts

Value  Count    Frequency (%)
Latin  1039296  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
O      173216  16.7%
t      173216  16.7%
h      173216  16.7%
e      173216  16.7%
r      173216  16.7%
s      173216  16.7%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1039296  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
O      173216  16.7%
t      173216  16.7%
h      173216  16.7%
e      173216  16.7%
r      173216  16.7%
s      173216  16.7%

pickup_year
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 7
Missing (%): < 0.1%
Memory size: 1.3 MiB
2009  173155
2008  54

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 692836
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2009
2nd row: 2009
3rd row: 2009
4th row: 2009
5th row: 2009

Common Values

Value      Count   Frequency (%)
2009       173155  > 99.9%
2008       54      < 0.1%
(Missing)  7       < 0.1%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
2009   173155  > 99.9%
2008   54      < 0.1%

Most occurring characters

Value  Count   Frequency (%)
0      346418  50.0%
2      173209  25.0%
9      173155  25.0%
8      54      < 0.1%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  692836  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      346418  50.0%
2      173209  25.0%
9      173155  25.0%
8      54      < 0.1%

Most occurring scripts

Value   Count   Frequency (%)
Common  692836  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      346418  50.0%
2      173209  25.0%
9      173155  25.0%
8      54      < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  692836  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      346418  50.0%
2      173209  25.0%
9      173155  25.0%
8      54      < 0.1%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the slope of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
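The calculation described above can be sketched in plain Python. This is a minimal illustration, not the report's own implementation; the function name `pearson_r` and the sample values are hypothetical. The 1/n factors of the covariance and the standard deviations cancel, so sums of deviations are used directly.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of (cross-)deviations; the shared 1/n factor cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical distances and fares, for illustration only.
distance = [0.8, 1.5, 1.9, 3.2, 4.4]
fare = [5.3, 6.1, 7.7, 11.3, 13.3]
r = pearson_r(distance, fare)

# Shifting and (positively) rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([1.609 * d + 10.0 for d in distance], fare)
print(r, r_rescaled)
```

The function assumes both inputs have the same length and non-zero variance; a constant column would cause a division by zero.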

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
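A minimal sketch of this rank-based calculation follows, assuming tied values receive the average of the ranks they span (a common convention, not confirmed by this report). The helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship (y = x cubed).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this cubic relationship ρ reaches 1 while r stays below 1, which illustrates why ρ is better suited to nonlinear monotonic dependence.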

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
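The pair-counting definition can be sketched directly. The version below is the variant usually called tau-a, which makes no adjustment for ties; statistical libraries often report the tie-adjusted tau-b instead, so results may differ on data with many ties. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0 is a tie in x or y and counts as neither
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


# Six pairs in total: five concordant, one discordant (the swapped 3 and 2).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

This brute-force version compares every pair and is therefore quadratic in the number of observations; it is meant to show the definition, not to scale to a dataset of this size.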

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
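For illustration, the sketch below computes a bias-corrected Cramér's V from a contingency table, following the correction as it is commonly stated for Bergsma's proposal. It is an assumption that the report uses exactly this form. The function name `cramers_v_corrected` and the example counts are hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of counts.

    Assumes every row and column total is non-zero and both
    variables have at least two levels.
    """
    r = len(table)
    k = len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrected phi squared and corrected table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Hypothetical counts: rows are two payment types, columns are tip paid no/yes.
print(cramers_v_corrected([[90, 10], [20, 80]]))
```

A table with identical row proportions yields 0 and a perfectly diagonal table yields 1, matching the range described above.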

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

(index) | df_index | vendor_id | pickup_datetime | dropoff_datetime | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude | rate_code | passenger_count | trip_distance | payment_type | fare_amount | tip_amount | tip_paid | rate_code_group | pickup_year
0 | 25 | VTS | 2009-11-23 15:45:00+00:00 | 2009-11-23 15:54:00+00:00 | -73.980932 | 40.754478 | -73.999478 | 40.738617 | NaN | 1 | 1.90 | CSH | 7.7 | 0.0 | 0 | Others | 2009
1 | 33 | CMT | 2009-10-10 01:29:17+00:00 | 2009-10-10 01:45:40+00:00 | -73.953836 | 40.770469 | -73.987500 | 40.744227 | NaN | 1 | 4.40 | CSH | 13.3 | 0.0 | 0 | Others | 2009
2 | 34 | VTS | 2009-10-01 19:31:00+00:00 | 2009-10-01 19:45:00+00:00 | -73.977180 | 40.762087 | -73.986268 | 40.756425 | NaN | 2 | 0.73 | CSH | 8.1 | 0.0 | 0 | Others | 2009
3 | 38 | VTS | 2009-03-20 01:53:00+00:00 | 2009-03-20 01:59:00+00:00 | -73.988992 | 40.736620 | -73.987647 | 40.728713 | NaN | 1 | 0.82 | CSH | 5.3 | 0.0 | 0 | Others | 2009
4 | 46 | VTS | 2009-12-14 18:34:00+00:00 | 2009-12-14 18:41:00+00:00 | -73.990737 | 40.734063 | -73.996930 | 40.740747 | NaN | 1 | 0.95 | CSH | 5.3 | 0.0 | 0 | Others | 2009
5 | 47 | VTS | 2009-02-27 22:55:00+00:00 | 2009-02-27 23:07:00+00:00 | -73.988765 | 40.722678 | -73.997277 | 40.741577 | NaN | 1 | 1.92 | CSH | 8.5 | 0.0 | 0 | Others | 2009
6 | 49 | VTS | 2009-05-31 23:20:00+00:00 | 2009-05-31 23:30:00+00:00 | -73.999480 | 40.752180 | -73.977587 | 40.749357 | NaN | 4 | 1.71 | CSH | 7.7 | 0.0 | 0 | Others | 2009
7 | 57 | DDS | 2009-04-16 23:36:39+00:00 | 2009-04-16 23:44:27+00:00 | -73.957471 | 40.774505 | -73.963228 | 40.795278 | NaN | 1 | 2.00 | CRD | 7.3 | 1.0 | 1 | Others | 2009
8 | 60 | CMT | 2009-12-10 17:48:14+00:00 | 2009-12-10 18:05:44+00:00 | -73.961764 | 40.774450 | -73.982187 | 40.736538 | NaN | 1 | 3.20 | CSH | 11.3 | 0.0 | 0 | Others | 2009
9 | 61 | CMT | 2009-04-25 15:36:22+00:00 | 2009-04-25 15:43:01+00:00 | -73.994957 | 40.739940 | -73.985768 | 40.758162 | NaN | 2 | 1.50 | CSH | 6.1 | 0.0 | 0 | Others | 2009

Last rows

(index) | df_index | vendor_id | pickup_datetime | dropoff_datetime | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude | rate_code | passenger_count | trip_distance | payment_type | fare_amount | tip_amount | tip_paid | rate_code_group | pickup_year
173206 | 1071790 | VTS | 2009-09-29 05:48:00+00:00 | 2009-09-29 05:54:00+00:00 | -73.999232 | 40.743838 | -73.980927 | 40.760217 | NaN | 1 | 1.59 | CSH | 6.1 | 0.0 | 0 | Others | 2009
173207 | 1071798 | VTS | 2009-11-18 08:02:00+00:00 | 2009-11-18 08:09:00+00:00 | -73.990783 | 40.755337 | -73.984623 | 40.745600 | NaN | 1 | 1.18 | CSH | 5.7 | 0.0 | 0 | Others | 2009
173208 | 1071803 | VTS | 2009-10-11 16:02:00+00:00 | 2009-10-11 16:16:00+00:00 | -73.957920 | 40.763248 | -73.948193 | 40.724575 | NaN | 1 | 4.38 | CSH | 12.9 | 0.0 | 0 | Others | 2009
173209 | 1071821 | CMT | 2009-12-04 12:07:56+00:00 | 2009-12-04 12:37:52+00:00 | -73.977582 | 40.746174 | -73.979804 | 40.782144 | NaN | 2 | 3.30 | CRD | 16.5 | 1.0 | 1 | Others | 2009
173210 | 1071836 | CMT | 2009-04-02 19:34:05+00:00 | 2009-04-02 19:42:52+00:00 | -73.992797 | 40.749391 | -73.983266 | 40.762028 | NaN | 2 | 1.40 | CSH | 7.9 | 0.0 | 0 | Others | 2009
173211 | 1071845 | CMT | 2009-04-16 15:29:13+00:00 | 2009-04-16 15:34:40+00:00 | -73.970144 | 40.748779 | -73.955521 | 40.764692 | NaN | 1 | 1.50 | CSH | 6.1 | 0.0 | 0 | Others | 2009
173212 | 1071856 | VTS | 2009-01-06 07:48:00+00:00 | 2009-01-06 07:52:00+00:00 | -73.990193 | 40.775940 | -73.979807 | 40.753225 | NaN | 5 | 2.94 | CSH | 9.3 | 0.0 | 0 | Others | 2009
173213 | 1071875 | VTS | 2009-04-15 08:36:00+00:00 | 2009-04-15 08:40:00+00:00 | -73.979832 | 40.781035 | -73.979885 | 40.771055 | NaN | 1 | 1.16 | CSH | 4.9 | 0.0 | 0 | Others | 2009
173214 | 1071881 | VTS | 2009-10-10 12:18:00+00:00 | 2009-10-10 12:31:00+00:00 | -73.967955 | 40.762550 | -73.992003 | 40.743155 | NaN | 1 | 2.36 | CSH | 9.3 | 0.0 | 0 | Others | 2009
173215 | 1071895 | CMT | 2009-07-13 09:13:18+00:00 | 2009-07-13 09:36:28+00:00 | -73.964563 | 40.760460 | -73.987274 | 40.752908 | NaN | 1 | 1.80 | CSH | 12.5 | 0.0 | 0 | Others | 2009